Proposal and Evaluation of a Technique of Discovering XML Structures for Efficient Retrieval

Authors

  • Hiroshi Ishikawa
  • Hajime Takekawa
  • Kaoru Katayama
Abstract

We propose an adaptable approach to discovering database schemas for well-formed XML data, such as EDI, news, and digital libraries, which we interchange, filter, or download for later retrieval and analysis. The generated schemas usually consist of more than one table. Our approach uses statistics of the XML data to control the number of tables into which the data is divided, so that the total cost of processing queries is reduced. To reduce the number of tables, we generate schemas suited to complex data such as text formatting tags and child elements with a small maximum number of occurrences. To this end, we introduce three functions, NULL expectation, Large Leaf Fields, and Large Child Fields, for controlling the number of tables. We describe how to translate XQuery queries into SQL queries. We also describe the concept of short paths contained in the generated database schemas and their effect on query processing performance. We discuss when and how the resulting database schemas need to change when the original XML data is updated. To validate our approach, we evaluate typical XML queries over both the generated schemas and normalized schemas, an alternative approach, and measure and compare their costs.
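The abstract names its three functions without defining them, so the following is only an illustrative sketch of the kind of XML statistics such an approach could rely on, not the authors' definitions. It assumes one plausible reading: a child element is a candidate for inlining into its parent's table when its maximum number of occurrences is small and the share of cells that would be NULL is low. The names `child_stats` and `should_inline`, the thresholds, and the sample document are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical sample data, used only to exercise the sketch.
SAMPLE = """
<library>
  <book><title>A</title><author>X</author><author>Y</author></book>
  <book><title>B</title><author>Z</author></book>
  <book><title>C</title></book>
</library>
"""


def child_stats(root):
    """Return {(parent_tag, child_tag): (max_occurrences, null_fraction)}.

    null_fraction is an assumed measure: if the child were stored in
    max_occurrences columns of the parent's table, it is the share of
    those cells that would be NULL across all parent rows.
    """
    parents = Counter()              # number of elements per tag
    per_parent = defaultdict(list)   # child counts, one entry per parent that has the child
    for elem in root.iter():
        parents[elem.tag] += 1
        counts = Counter(child.tag for child in elem)
        for child_tag, n in counts.items():
            per_parent[(elem.tag, child_tag)].append(n)

    stats = {}
    for (parent_tag, child_tag), occurrences in per_parent.items():
        max_occ = max(occurrences)
        slots = parents[parent_tag] * max_occ   # cells needed if inlined
        stats[(parent_tag, child_tag)] = (max_occ, 1 - sum(occurrences) / slots)
    return stats


def should_inline(max_occ, null_fraction, max_inline=2, max_null_fraction=0.5):
    """Assumed decision rule: inline only small, mostly non-NULL children."""
    return max_occ <= max_inline and null_fraction <= max_null_fraction


root = ET.fromstring(SAMPLE)
stats = child_stats(root)
for (parent, child), (max_occ, null_frac) in sorted(stats.items()):
    print(parent, child, max_occ, null_frac, should_inline(max_occ, null_frac))
```

In this sample, `author` occurs at most twice per `book`, and inlining it as two columns would leave half of those cells NULL, which sits exactly at the assumed threshold. Tightening `max_null_fraction` would instead push `author` into its own table, which is the table-count trade-off the abstract describes.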

Similar resources

Prototyping a Vibrato-Aware Query-By-Humming (QBH) Music Information Retrieval System for Mobile Communication Devices: Case of Chromatic Harmonica

Background and Aim: The current research aims at prototyping query-by-humming music information retrieval systems for smartphones. Methods: This multi-method research simultaneously follows the simulation technique from the mixed models of operations research methodology and the documentary research method. Two chromatic harmonica albums comprised the research population. To achieve the purpose ...


An Effective Path-aware Approach for Keyword Search over Data Graphs

Keyword search is known as a user-friendly alternative to structured query languages for retrieving information from graph-structured data. Efficient retrieval of relevant answers to a keyword query and effective ranking of these answers according to their relevance are the two main challenges in keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...


Review of ranked-based and unranked-based metrics for determining the effectiveness of search engines

Purpose: Traditionally, there have been many metrics for evaluating search engines; nevertheless, various researchers have proposed new metrics in recent years. Awareness of these new metrics is essential for conducting research on search engine evaluation. So, the purpose of this study was to provide an analysis of important and new metrics for evaluating search engines. Methodology: This is ...


Evaluation of Close-Range Photogrammetric Technique for Deformation Monitoring of Large-Scale Structures: A review

Close-range photogrammetry has been used in many applications in recent decades in various fields such as industry, cultural heritage, medicine and civil engineering. As an important tool for displacement measurement and deformation monitoring, close-range photogrammetry has generally been employed in industrial plants, quality control and accidents. Although close-range photogrammetric applica...


Structure- and Content-Based Retrieval for XML Documents

Copyright © 2001, Idea Group Publishing. Abstract: As the number of XML documents is increasing dramatically, it is necessary to develop an XML document retrieval system that can support both structure-based retrieval and content-based retrieval. In order to support structure-based retrieval, we design four efficient index structures, i.e., keyword, structure, element and attribute index, by i...


Publication date: 2007